Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#992: added design doc for script options #477

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

tomuben
Copy link
Collaborator

@tomuben tomuben commented Nov 15, 2024

@tomuben tomuben force-pushed the doc/992_add_design_doc_for_script_options branch from 3522853 to 0a2684d Compare November 15, 2024 14:59
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you create these? If it is drawio, please use the extension drawio.png otherwise it is not clear how to change them and the tooling won't recognize them.


#### Legacy Parser

The legacy parser (V1) parser searches for one specific script option. The parser starts from beginning of the script code. If found, the parser immediately removes the script option from the script code and returns the option value. It validates the
Copy link
Collaborator

@tkilias tkilias Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a narrow interpretation of arc42, the removal would be part of the runtime view and not the building blocks.


Tags: V2

### General Parser Integration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the DB

@tomuben tomuben force-pushed the doc/992_add_design_doc_for_script_options branch from b6dca79 to 207e27f Compare November 22, 2024 16:15
@@ -0,0 +1,27 @@
#!/bin/bash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at some Point replace this with a nox session, please create an issue

## Constraints

- The parser implementation must be in C++.
- The chosen parser implementation is [ctpg](https://github.com/peter-winter/ctpg), which supports the definition of Lexer and Parser Rules in C++ code.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a constraint, but a decision that follows from the constraints


![Components](diagrams/OveralScriptOptionalsBuildingBlocks.drawio.png)

At the very high level there can be distinguished between the generic "Script Options Parser" module which parses a UDF script code and returns the found script options, and the "Script Options Parser Handler" which converts the Java UDF specific script options. In both modules there are specific implementation for the legacy parser and the new CTPG based parser.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add reason, why we need to implementation


### Script Options Parser

The parser component can be used to parse any script code (Java, Python, R) for any script options. It provides simplistic interfaces, which are different between the two versions, which accept the script code as input and return the found script option(s).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add reason, why the interfaces are different. Here, they work inherently different


As the parser needs to find script options in any given script code, the generated parser must accept any strings which are not script options and ignore those. In order to achieve this, the lexer rules need to be as simple as possible, in order to avoid collisions.

It is important to emphasize that in contrast to the legacy parser, the caller is responsible for removing the script options from the script code.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a good reason?


![LegacyParserHandler](diagrams/LegacyParserHandler.drawio.png)

The `ScriptOptionsLinesParserLegacy` class uses the Parser to search for Java specific script options and forwards the found options to class `ConverterLegacy`, which uses a common implementation for the conversion of the options.
Copy link
Collaborator

@tkilias tkilias Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The `ScriptOptionsLinesParserLegacy` class uses the Parser to search for Java specific script options and forwards the found options to class `ConverterLegacy`, which uses a common implementation for the conversion of the options.
The `ScriptOptionsLinesParserLegacy` class uses the Parser to search for Java specific script options and forwards the found options to the class `ConverterLegacy`, which uses a common implementation for the conversion of the options.

common implementation is misleading, because it suggests both handler use it


![CTPGParserHandler](diagrams/CTPGParserHandler.drawio.png)

The `ScriptOptionsLinesParserCTPG` class uses the new CTPG basedParser to search for **all** Java specific script options at once. Then it forwards the found options to class `ConverterV2`, which uses a common implementation for the conversion of the options. `ConverterV2` also implements the functions to convert Jvm otions and JAR options.
Copy link
Collaborator

@tkilias tkilias Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The `ScriptOptionsLinesParserCTPG` class uses the new CTPG basedParser to search for **all** Java specific script options at once. Then it forwards the found options to class `ConverterV2`, which uses a common implementation for the conversion of the options. `ConverterV2` also implements the functions to convert Jvm otions and JAR options.
The `ScriptOptionsLinesParserCTPG` class uses the new CTPG based Parser to search for **all** Java specific script options at once. Then it forwards the found options to class `ConverterV2`, which uses a common implementation for the conversion of the options. `ConverterV2` also implements the functions to convert Jvm otions and JAR options.

common again

`ScriptOptionsLinesParserCTPG` uses an instance of `ScriptImporter` to import foreign scripts. Because the new parser collects all script options at once, but backwards compatibility with existing UDF scripts must be ensured, there is an additional level of complexity in the import script algorithm. The algorithm is described in the following pseudocode snippet:
```
function import(script_code, options_map)
import_option = options_map.find("import")
Copy link
Collaborator

@tkilias tkilias Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import_option = options_map.find("import")
import_options = options_map.find("import")

Should this be plural?

static void doSomething() {}
}
```

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the expected results


### Parser Implementation V1

The legacy parser (V1) parser searches for one specific script option. The parser starts from beginning of the script code. If found, the parser immediately removes the script option from the script code and returns the option value. It validates the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The legacy parser (V1) parser searches for one specific script option. The parser starts from beginning of the script code. If found, the parser immediately removes the script option from the script code and returns the option value. It validates the
The legacy parser (V1) parser searches for one specific script option. The parser starts from the beginning of the script code. If found, the parser immediately removes the script option from the script code and returns the option value. It validates the

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of the sentence is missing

### Lexer and Parser Rules Option
`dsn~lexer-parser-rules~1`

Lexer and Parser rules to recognize `%optionKey`, `optionValue`, with whitespace characters as separator. The Parser rules will define the grammar to correctly identify Script Options, manage multiple options with the same key, and handle duplicates.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention simple lexer rules to deal with whitespace and escaping correct

### Ignore lines without script options
`dsn~ignore-lines-without-script-options~1`

In order to avoid lower performance compared to the old implementation, the parser must run only on lines which contain a `%` character.
Copy link
Collaborator

@tkilias tkilias Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In order to avoid lower performance compared to the old implementation, the parser must run only on lines which contain a `%` character.
In order to avoid lower performance compared to the old implementation, the parser must run only on lines which contain a `%` character after only whitespaces.


Tags: V2

### Lexer and Parser Rules Whitespace Sequences
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to continue here

- '\f' => <form feed> character
- '\v' => <vertical tab> character

Implement rules which replace those white space escape tokens only at the beginning of an option value. Add parser rules to ignore those tokens in anything else, which is not a script option. Those token are not expected to be part of an option key.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Implement rules which replace those white space escape tokens only at the beginning of an option value. Add parser rules to ignore those tokens in anything else, which is not a script option. Those token are not expected to be part of an option key.
Implement rules which replace those white space escape tokens only at the beginning of an option value. Add parser rules to leave those tokens as is in anything else, which is not a script option. Those token are not expected to be part of an option key.

### Lexer and Parser Rules Escape Sequences
`dsn~lexer-parser-rules-escape-sequences~1`

Define the Lexer rules to tokenize '\n', '\r', '\; sequences, and parser rules to replace those sequences with <line feed>, <carriage return> or ';' characters.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Define the Lexer rules to tokenize '\n', '\r', '\; sequences, and parser rules to replace those sequences with <line feed>, <carriage return> or ';' characters.
Define the Lexer rules to tokenize '\n', '\r', '\;' sequences, and parser rules to replace those sequences with <line feed>, <carriage return> or ';' characters.

`dsn~lexer-parser-rules-escape-sequences~1`

Define the Lexer rules to tokenize '\n', '\r', '\; sequences, and parser rules to replace those sequences with <line feed>, <carriage return> or ';' characters.
Implement rules which replace those token at any location in an option value. Add parser rules to ignore those tokens in anything else, which is not a script option. Those token are not expected to be part of an option key.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Implement rules which replace those token at any location in an option value. Add parser rules to ignore those tokens in anything else, which is not a script option. Those token are not expected to be part of an option key.
Implement rules which replace those token at any location in an option value. Add parser rules to leave those tokens as is in anything else, which is not a script option. Those token are not expected to be part of an option key.

### Script Option Removal Mechanism
`dsn~script-option-removal~1`

Implement a method in class `ScriptOptionLinesParserCTPG` which removes *all* identified Script Options from the original script code at once. This method be executed after the import scripts are replaced. The algorithm must replace the script options in reverse order in order to maintain consistency of internal list of positions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Implement a method in class `ScriptOptionLinesParserCTPG` which removes *all* identified Script Options from the original script code at once. This method be executed after the import scripts are replaced. The algorithm must replace the script options in reverse order in order to maintain consistency of internal list of positions.
Implement a method in class `ScriptOptionLinesParserCTPG` which removes *all* identified Script Options from the original script code at once. This method should be executed after the import scripts are replaced. The algorithm must replace the script options in reverse order in order to maintain consistency of internal list of positions.

### Unknown Options Check
`dsn~script-option-unknown-options-behvaior-v2~1`

Implement a check in class `ScriptOptionLinesParserCTPG` which verifies that only known script options are found. Otherwise, raise an exception with information about the position and name of the unknown option.
Copy link
Collaborator

@tkilias tkilias Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, after a second thought this gets nasty, because the supported options depend on the handlers, but that a handler doesn't support certain options doesn't mean they are illegal, it is still possible that a subsequent handler can handle them. But, the handlers are independent of each other and don't know which options they support. This means, we need centralize all supported options. This means also whenever we are adding a new option, we have to add it to the handler and the parser.

Example is the DB parser and Java parser, the DB parser has no idea what the Java parser supports.


Tags: V2

### Java %scriptclass Option Handling in Design
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in Design?

### Java %scriptclass Option Handling in Design
`dsn~java-scriptclass-option-handling~1`

Implement a function in the `Converter` class which adds the script class option to the JVM Options list.
Copy link
Collaborator

@tkilias tkilias Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Implement a function in the `Converter` class which adds the script class option to the JVM Options list.
Implement a function in the `ConverterV2` class which adds the script class option to the JVM Options list.

?

### Java %jar Option Collection
`dsn~java-jar-option-collection~1`

Implement an algorithm in class `ConverterV2` to split the given Jar option value by colon (':') and then collect the found Jar options in a list.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add our links about what are legal classpaths

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants